Access public data using R in Oracle
AFSC Oracle users can access the database via SQL developer to view and pull the production data directly from the GAP_PRODUCTS Oracle schema.
Many users will want to access the data from Oracle using R. The user will need to install the RODBC R package and ask OFIS (IT) connect R to Oracle. Then, use the following code in R to establish a connection from R to Oracle:
Here, the user can write in their username and password directly into the RODBC connect function. Never save usernames or passwords in scripts that may be intentionally or unintentionally shared with others. If no username and password is entered in the function, pop-ups will appear on the screen asking for the username and password.
If the user has access to the AFSC Oracle database, the user can use SQL developer to view and pull the GAP Products data directly from the GAP_PRODUCTS Oracle schema.
Many users will want to access the data from Oracle using R. The user will need to install the RODBC R package and ask OFIS (IT) connect R to Oracle. Then, use the following code in R to establish a connection from R to Oracle:
Here, the user can establish the oracle connection by entering their username and password in the channel <- gapindex::oracle_connect() function. Never save usernames or passwords in scripts that may be intentionally or unintentionally shared with others. If no username and password is entered in the function, pop-ups will appear on the screen asking for the username and password.
You can download all of the tables locally using a variation of the code below. Once connected, pull and save the tables of interest into the R environment.
locations <- c(
"GAP_PRODUCTS.CPUE",
"GAP_PRODUCTS.BIOMASS",
"GAP_PRODUCTS.AGECOMP",
"GAP_PRODUCTS.SIZECOMP",
"GAP_PRODUCTS.STRATUM_GROUPS",
"GAP_PRODUCTS.AREA_ID",
"GAP_PRODUCTS.DESIGN_TABLE",
"GAP_PRODUCTS.TAXONOMICS_WORMS",
"GAP_PRODUCTS.TAXONOMICS_ITIS",
"GAP_PRODUCTS.TAXONOMIC_CONFIDENCE",
"GAP_PRODUCTS.METADATA_COLUMN"
)
for (i in 1:length(locations)) {
print(locations[i])
a <- RODBC::sqlQuery(channel, paste0("SELECT * FROM ", locations[i]))
write.csv(x = a, file = here::here("data", paste0(locations[i], ".csv")))
}
Biomass and abundance for Pacific Ocean perch from 1990 – 2023 for the western/central/eastern GOA management areas as well as for the entire region.
dat <- RODBC::sqlQuery(channel = channel,
query =
"WITH FILTERED_STRATA AS (
SELECT AREA_ID, DESCRIPTION FROM GAP_PRODUCTS.AREA
WHERE TYPE in ('REGULATORY AREA', 'REGION')
AND SURVEY_DEFINITION_ID = 47)
SELECT
BIOMASS_MT,
POPULATION_COUNT,
YEAR,
DESCRIPTION
FROM GAP_PRODUCTS.BIOMASS BIOMASS
JOIN FILTERED_STRATA STRATA
ON STRATA.AREA_ID = BIOMASS.AREA_ID
WHERE BIOMASS.SURVEY_DEFINITION_ID IN 47
AND BIOMASS.SPECIES_CODE = 30060")
dat0 <- dat %>%
janitor::clean_names() %>%
dplyr::select(biomass_mt, population_count, year, area = description) %>%
pivot_longer(cols = c("biomass_mt", "population_count"),
names_to = "var",
values_to = "val") %>%
dplyr::mutate(
val = ifelse(var == "biomass_mt", val/1e6, val/1e9),
var = ifelse(var == "biomass_mt", "Biomass (Mmt)", "Population (B)"),
area = gsub(x = area, pattern = " - ", replacement = "\n"),
area = gsub(x = area, pattern = ": ", replacement = "\n"),
type = sapply(X = strsplit(x = area, split = "\n", fixed = TRUE), `[[`, 2)) %>%
dplyr::arrange(type) %>%
dplyr::mutate(
area = factor(area, levels = unique(area), labels = unique(area), ordered = TRUE))
# install.packages("scales")
library(scales)
figure <- ggplot2::ggplot(
dat = dat0,
mapping = aes(x = year, y = val, color = type)) +
ggplot2::geom_point(size = 3) +
ggplot2::facet_grid(cols = vars(area), rows = vars(var), scales = "free_y") +
ggplot2::scale_x_continuous(name = "Year", n.breaks = 3) +
ggplot2::scale_y_continuous(name = "Estimate", labels = comma) +
ggplot2::labs(title = 'GOA Pacific Ocean perch biomass and abundance 1990 – 2023') +
ggplot2::guides(color=guide_legend(title = "Region Type"))+
ggplot2::scale_color_grey() +
ggplot2::theme_bw() +
ggplot2::theme(legend.direction = "horizontal",
legend.position = "bottom")
flextable::flextable(head(dat))
BIOMASS_MT | POPULATION_COUNT | YEAR | DESCRIPTION |
|---|---|---|---|
110,656.50 | 276,006,739 | 1,984 | EASTERN GOA - INPFC |
100,542.22 | 183,617,817 | 1,987 | EASTERN GOA - INPFC |
101,678.29 | 177,314,827 | 1,990 | EASTERN GOA - INPFC |
151,580.88 | 235,121,936 | 1,993 | EASTERN GOA - INPFC |
212,035.74 | 384,342,830 | 1,996 | EASTERN GOA - INPFC |
77,115.84 | 192,233,028 | 1,999 | EASTERN GOA - INPFC |
figure
Northern and Southern rock sole size composition data from 1991 – 2022 for the Aleutian Islands, with Ridge plot from ggridges.
dat <- RODBC::sqlQuery(channel = channel,
query =
"WITH FILTERED_STRATA AS (
SELECT
AREA_ID,
DESCRIPTION
FROM GAP_PRODUCTS.AREA
WHERE TYPE = 'REGION'
AND SURVEY_DEFINITION_ID = 52)
SELECT
LENGTH_MM,
YEAR
FROM GAP_PRODUCTS.SIZECOMP SIZECOMP
JOIN FILTERED_STRATA STRATA
ON STRATA.AREA_ID = SIZECOMP.AREA_ID
WHERE SIZECOMP.SURVEY_DEFINITION_ID IN 52
AND SIZECOMP.SPECIES_CODE IN (10261, 10262)")
dat0 <- dat %>%
janitor::clean_names() %>%
dplyr::mutate(length_cm = length_mm/10)
# install.packages("ggridges")
library(ggridges)
figure <-
ggplot2::ggplot(
data = dat0,
mapping = aes(x = length_cm, y = as.factor(year), fill = stat(x))) +
ggridges::theme_ridges(center_axis_labels = TRUE) +
ggridges::geom_density_ridges_gradient(scale = 4, show.legend = FALSE) +
ggplot2::scale_y_discrete(name = "Year", expand = c(0.01, 0)) +
ggplot2::scale_x_continuous(name = "Length (cm)", expand = c(0.01, 0)) +
# ggplot2::scale_fill_grey() +
ggplot2::labs(title = 'AI Rock sole Size Compositions 1991 – 2022')
flextable::flextable(head(dat))
LENGTH_MM | YEAR |
|---|---|
110 | 1,997 |
130 | 1,997 |
140 | 1,997 |
150 | 1,997 |
160 | 1,997 |
170 | 1,997 |
figure
Walleye pollock age composition for the EBS Standard Area from 1982 – 2022 and the EBS + NW Area from 1987 – 2022, with age pyramid plot.
dat <- RODBC::sqlQuery(channel = channel,
query =
"WITH FILTERED_STRATA AS (
SELECT
AREA_ID,
DESCRIPTION
FROM GAP_PRODUCTS.AREA
WHERE TYPE = 'REGION' AND
SURVEY_DEFINITION_ID = 98)
SELECT
AGECOMP.AGE,
AGECOMP.POPULATION_COUNT,
AGECOMP.SEX
FROM GAP_PRODUCTS.AGECOMP AGECOMP
JOIN FILTERED_STRATA STRATA
ON STRATA.AREA_ID = AGECOMP.AREA_ID
WHERE SURVEY_DEFINITION_ID = 98
AND SPECIES_CODE = 21740
AND AGE >= 0")
dat0 <- dat %>%
janitor::clean_names() %>%
dplyr::filter(sex %in% c(1,2)) %>%
dplyr::mutate(
sex = ifelse(sex == 1, "M", "F"),
population_count = # change male population to negative
ifelse(sex=="M", population_count*(-1), population_count*1)/1e9)
figure <- ggplot2::ggplot(
data = dat0,
mapping =
aes(x = age,
y = population_count,
fill = sex)) +
ggplot2::scale_fill_grey() +
ggplot2::geom_bar(stat = "identity") +
ggplot2::coord_flip() +
ggplot2::scale_x_continuous(name = "Age") +
ggplot2::scale_y_continuous(name = "Population (billions)", labels = abs) +
ggplot2::ggtitle(label = "EBS Walleye Pollock Age Compositions 1982 – 2022") +
ggplot2::guides(fill = guide_legend(title = "Sex"))+
ggplot2::theme_bw()
flextable::flextable(head(dat))
AGE | POPULATION_COUNT | SEX |
|---|---|---|
1 | 137,398,782 | 1 |
2 | 1,040,787,296 | 1 |
3 | 1,183,060,368 | 1 |
4 | 1,661,729,359 | 1 |
5 | 507,355,476 | 1 |
6 | 70,450,613 | 1 |
figure
Pacific cod biomass and abundance data for the NBS by stratum.
dat <- RODBC::sqlQuery(channel = channel,
query =
"WITH FILTERED_STRATA AS (
SELECT
AREA_ID,
AREA_NAME,
DESCRIPTION
FROM GAP_PRODUCTS.AREA
WHERE TYPE in ('STRATUM') AND
SURVEY_DEFINITION_ID = 143)
SELECT
BIOMASS.BIOMASS_MT,
BIOMASS.POPULATION_COUNT,
BIOMASS.YEAR,
STRATA.AREA_NAME
FROM GAP_PRODUCTS.BIOMASS BIOMASS
JOIN FILTERED_STRATA STRATA
ON STRATA.AREA_ID = BIOMASS.AREA_ID
WHERE BIOMASS.SURVEY_DEFINITION_ID IN 143
AND BIOMASS.SPECIES_CODE = 21720")
dat0 <- dat %>%
janitor::clean_names() %>%
dplyr::select(biomass_mt, population_count, year, area = area_name) %>%
pivot_longer(cols = c("biomass_mt", "population_count"),
names_to = "var",
values_to = "val") %>%
dplyr::mutate(
val = ifelse(var == "biomass_mt", val/1e6, val/1e9),
var = ifelse(var == "biomass_mt", "Biomass (Mmt)", "Population (B)"),
area = factor(area, levels = unique(area), labels = unique(area), ordered = TRUE))
figure <- ggplot2::ggplot(
dat = dat0,
mapping = aes(y = val, x = year, fill = area)) +
ggplot2::geom_bar(position="stack", stat="identity") +
ggplot2::facet_grid(rows = vars(var), scales = "free_y") +
ggplot2::scale_y_continuous(name = "Estimate", labels = comma) +
ggplot2::scale_x_continuous(name = "Year", breaks = unique(dat0$year)) +
ggplot2::labs(title = 'NBS Pacific cod biomass and abundance by stratum') +
ggplot2::guides(fill=guide_legend(title = "Region Type"))+
ggplot2::scale_fill_grey() +
ggplot2::theme_bw() +
ggplot2::theme(legend.direction = "horizontal",
legend.position = "bottom")
flextable::flextable(head(dat))
BIOMASS_MT | POPULATION_COUNT | YEAR | AREA_NAME |
|---|---|---|---|
7,462.5586 | 4,724,153.5 | 2,010 | Inner Domain |
7,462.5586 | 4,724,153.5 | 2,010 | Inner Domain |
7,462.5586 | 4,724,153.5 | 2,010 | Inner Domain |
7,462.5586 | 4,724,153.5 | 2,010 | Inner Domain |
7,462.5586 | 4,724,153.5 | 2,010 | Inner Domain |
680.4357 | 250,836.5 | 2,010 | Middle Domain |
figure
Pacific Ocean perch biomass totals for GOA between 1984-2021 from GAP_PRODUCTS.BIOMASS
dat <- RODBC::sqlQuery(channel = channel,
query =
"SELECT
SURVEY_DEFINITION_ID,
BIOMASS_MT,
YEAR
FROM GAP_PRODUCTS.BIOMASS
WHERE SPECIES_CODE = 30060
AND SURVEY_DEFINITION_ID = 47
AND AREA_ID = 99903
AND YEAR BETWEEN 1984 AND 2021;") %>%
janitor::clean_names() %>%
dplyr::mutate(biomass_mt = biomass_mt/1000)
a_mean <- dat %>%
dplyr::group_by(survey_definition_id) %>%
dplyr::summarise(biomass_mt = mean(biomass_mt, na.rm = TRUE),
minyr = min(year, na.rm = TRUE),
maxyr = max(year, na.rm = TRUE))
figure <-
ggplot(data = dat,
mapping = aes(x = year,
y = biomass_mt)) +
ggplot2::geom_point(size = 2.5, color = "grey40") +
ggplot2::scale_x_continuous(
name = "Year",
labels = scales::label_number(
accuracy = 1,
big.mark = "")) +
ggplot2::scale_y_continuous(
name = "Biomass (Kmt)",
labels = comma) +
ggplot2::geom_segment(
data = a_mean,
mapping = aes(x = minyr,
xend = maxyr,
y = biomass_mt,
yend = biomass_mt),
linetype = "dashed",
size = 2) +
ggplot2::ggtitle(
label = "GOA Pacific Ocean Perch Biomass 1984-2021",
subtitle = paste0("Mean = ",
formatC(x = a_mean$biomass_mt,
digits = 2,
big.mark = ",",
format = "f"),
" Kmt")) +
ggplot2::theme_bw()
flextable::flextable(head(dat))
survey_definition_id | biomass_mt | year |
|---|---|---|
47 | 220.9105 | 1,984 |
47 | 241.4382 | 1,987 |
47 | 157.2951 | 1,990 |
47 | 483.6226 | 1,993 |
47 | 771.4128 | 1,996 |
47 | 727.0635 | 1,999 |
figure
Figure 1: Plot of POP biomass totals for GOA between 1984-2021 from GAP_PRODUCTS.BIOMASS.
akgfmaps mapPacific Ocean perch catch-per-unit-effort estimates for EBS in 2021 from GAP_PRODUCTS.CPUE and map constructed using akgfmaps. Here, we’ll use AKFIN HAUL and CRUISES data also included in this repo, for convenience, though they are very similar to their RACEBASE analogs.
dat <- RODBC::sqlQuery(channel = channel,
query =
"SELECT
(cp.CPUE_KGKM2/100) CPUE_KGHA, -- akgfmaps is expecting hectares
hh.LATITUDE_DD_START LATITUDE,
hh.LONGITUDE_DD_START LONGITUDE
FROM GAP_PRODUCTS.CPUE cp
-- Use HAUL data to obtain LATITUDE & LONGITUDE and connect to cruisejoin
LEFT JOIN GAP_PRODUCTS.AKFIN_HAUL hh
ON cp.HAULJOIN = hh.HAULJOIN
-- Use CRUISES data to obtain YEAR and SURVEY_DEFINITION_ID
LEFT JOIN GAP_PRODUCTS.AKFIN_CRUISES cc
ON hh.CRUISEJOIN = cc.CRUISEJOIN
WHERE cp.SPECIES_CODE = 30060
AND cc.SURVEY_DEFINITION_ID = 98
AND cc.YEAR = 2021;")
# devtools::install_github("afsc-gap-products/akgfmaps", build_vignettes = TRUE)
library(akgfmaps)
figure <- akgfmaps::make_idw_map(
x = dat, # Pass data as a data frame
region = "bs.south", # Predefined EBS area
set.breaks = "jenks", # Gets Jenks breaks from classint::classIntervals()
in.crs = "+proj=longlat", # Set input coordinate reference system
out.crs = "EPSG:3338", # Set output coordinate reference system
grid.cell = c(20000, 20000), # 20x20km grid
key.title = "Pacific Ocean perch") # Include in the legend title
[inverse distance weighted interpolation]
[inverse distance weighted interpolation]
CPUE_KGHA | LATITUDE | LONGITUDE |
|---|---|---|
0 | 56.66721 | -159.7800 |
0 | 56.98080 | -159.6926 |
0 | 56.97832 | -159.1519 |
0 | 57.31992 | -159.0614 |
0 | 57.32157 | -158.3811 |
0 | 57.65189 | -158.3673 |
figure$plot +
ggplot2::guides(fill=guide_legend(title = "Pacific Ocean perch\nCPUE (kg/km2)")) |>
change_fill_color(new.scheme = "grey", show.plot = FALSE)
This repository is a scientific product and is not official communication of the National Oceanic and Atmospheric Administration, or the United States Department of Commerce. All NOAA GitHub project code is provided on an ‘as is’ basis and the user assumes responsibility for its use. Any claims against the Department of Commerce or Department of Commerce bureaus stemming from the use of this GitHub project will be governed by all applicable Federal law. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by the Department of Commerce. The Department of Commerce seal and logo, or the seal and logo of a DOC bureau, shall not be used in any manner to imply endorsement of any commercial product or activity by DOC or the United States Government.
Software code created by U.S. Government employees is not subject to copyright in the United States (17 U.S.C. §105). The United States/Department of Commerce reserve all rights to seek and obtain copyright protection in countries other than the United States for Software authored in its entirety by the Department of Commerce. To this end, the Department of Commerce hereby grants to Recipient a royalty-free, nonexclusive license to use, copy, and create derivative works of the Software outside of the United States.
U.S. Department of Commerce | National Oceanographic and Atmospheric Administration | NOAA Fisheries